Target grade: A

Dataset used

I decided to continue with the dataset I found for my Coding 3 final project, that is the UFO sightings data from Tidy Tuesday. If you are unfamiliar with this dataset, please find below a short description (copied from my Coding 3 submission).

The UFO sightings dataset has been featured on Tidy Tuesday on the 25th week of 2023. The dataset comes from the National UFO Reporting Centre (NUFORC), and has been enriched with data from https://sunrise-sunset.org/.

The dataset contains three tables:

The dataset contains 96,429 sightings (most of which are from the US) and 14,417 places.

As most of the observations are from the US, I decided to only concentrate on these.

library(tidytuesdayR)
raw_data <- tidytuesdayR::tt_load('2023-06-20')
## ---- Compiling #TidyTuesday Information for 2023-06-20 ----
## --- There are 3 files available ---
## 
## 
## ── Downloading files ───────────────────────────────────────────────────────────
## 
##   1 of 3: "ufo_sightings.csv"
##   2 of 3: "places.csv"
##   3 of 3: "day_parts_map.csv"
library(data.table)
ufo_sightings <- data.table(raw_data$`ufo_sightings`)
places <- data.table(raw_data$`places`)

rm(raw_data)

ufo_sightings <- ufo_sightings[country_code == 'US']
places <- places[country_code == 'US']

Custom theme

Before doing any plotting, I defined my custom theme used on all of my plots.

theme_custom <- function() {
  theme(
    text = element_text(family = 'serif'),
    axis.text = element_text(color = 'darkgrey', size = 8),
    axis.title = element_text(color = 'darkgoldenrod4', size = 10, face = 'bold'),
    panel.background = element_rect(fill = 'beige'),
    panel.border = element_blank(),
    plot.title = element_text(color = 'darkgoldenrod4', size = 13, face = 'bold', hjust = 0.5),
    plot.subtitle = element_text(color = 'darkgoldenrod4', size = 11, hjust = 0.5),
    legend.position = 'top',
    legend.title = element_text(color = 'darkgoldenrod4', size = 10),
    legend.text = element_text(color = 'darkgoldenrod4', size = 8),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = 'bisque2', linewidth = 0.25),
    axis.ticks = element_blank()
  )
}

Plotting

Plot 1: Evolution of the number of sightings per state since 2000

To visualize the evolution of the number of sightings over time, I added the year of the sightings as a transition state, and I created an animated racing barchart. For better visibility, I only show the top 25 states in each year by the number of sightings.

I added the group aesthetic and the ease_aes so that the transition between states truly resembles a racing barchart.

p1_data <- ufo_sightings[year(reported_date_time) >= 2000, .N, by = .(year(reported_date_time), state)][order(year, -N)]
p1_data <- p1_data[, rank := order(-N), by = year][rank <= 25, ]

library(ggplot2)
library(gganimate)

p1 <- ggplot(p1_data, aes(N, reorder(factor(rank), -rank), fill = state, group = state)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(label = state), hjust = 1.2, vjust = 0.5, size = 5, color = 'white') +
  labs(title = 'US UFO sightings in {closest_state} per state (top 25)',
       subtitle = 'Total sigthings in this year: {ufo_sightings[year(reported_date_time) == closest_state, .N]}',
       x = 'Number of sightings',
       y = 'Rank') +
  theme_custom() +
  theme(legend.position = 'none',
        panel.grid.major.y = element_blank()) +
  scale_fill_discrete() +
  transition_states(year, transition_length = 3) +
  ease_aes('linear')

animate(p1, duration = 24, fps = 30)

Plot 2: Scatterpie map visualization of day parts’ share of total number of sightings per state

For the second visualization, I wanted to show whether there is a difference between states in which part of the day they report UFO sightings. To do this, I plotted a piechart for each state on a map. As there were too many day parts in the dataset, I grouped them into two categories: before and after noon. I downloaded a US state GeoJSON from GitHub, and I also added custom map tiles as a background.

download.file('https://raw.githubusercontent.com/PublicaMundi/MappingAPI/refs/heads/master/data/geojson/us-states.json',
              'us.geojson', mode = 'wb')

library(sf)
## Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
us_map <- st_read('us.geojson')
## Reading layer `us' from data source 
##   `D:\EGYETEM\CEU\OneDrive - Central European University (CEU GmbH Hungarian Branch Office)\DV3\HA2\us.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 52 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -188.9049 ymin: 17.92956 xmax: -65.6268 ymax: 71.35163
## Geodetic CRS:  WGS 84
library(maptiles)

map_tiles <- get_tiles(us_map, apikey = 'add_your_own_stadia_api_key_here', zoom = 4,
                       provider = 'CartoDB.PositronNoLabels', crop = TRUE)
## |---------|---------|---------|---------|=========================================                                          
library(tidyterra)
## 
## Kapcsolódás csomaghoz: 'tidyterra'
## The following object is masked from 'package:stats':
## 
##     filter
library(scatterpie)
## scatterpie v0.2.4 Learn more at https://yulab-smu.top/
ufo_sightings[, after12 := as.character(day_part %in% c('night', 'nautical dusk', 'afternoon',
                                           'astronomical dusk', 'civil dusk'))]
ufo_sightings[after12 == 'TRUE', after12 := 'After_12AM']
ufo_sightings[after12 == 'FALSE', after12 := 'Before_12AM']

p2_data <- merge(ufo_sightings, places, by = c('city', 'state', 'country_code'))[, .(.N, lat = mean(latitude), lon = mean(longitude)), by = .(state, after12)]
p2_data_wide <- dcast(p2_data, state + lat + lon ~ after12, value.var = "N")
p2_data_wide <- p2_data_wide[, .(lat = mean(lat), lon = mean(lon), After_12AM = sum(After_12AM, na.rm = TRUE), Before_12AM = sum(Before_12AM, na.rm = TRUE)), by = state]

library(ggthemes)

ggplot() +
  geom_spatraster_rgb(data = map_tiles) +
  geom_sf(data = us_map, fill = NA, color = 'black') +
  geom_scatterpie(
    aes(x = lon, y = lat, group = state),
    data = p2_data_wide,
    cols = c("Before_12AM", "After_12AM"),
    pie_scale = 0.75
  ) +
  labs(title = 'Share of sightings before and after noon in each US state') +
  scale_fill_excel_new(theme ='Paper', name = 'Time of day') +
  theme_custom() +
  theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    panel.grid.major = element_blank(),
  )
## <SpatRaster> resampled to 500456 cells.

Plot 3: Spatial distribution of average UFO sighting duration in the US

In the third plot, I wanted to show whether there are spatial differences between the average sighting durations. However, I felt that doing this on the city level would be too granular, but aggregating on the state level would be too high-level. So, I opted for a hexagon plot, overlayed on a map. The color of each hexagon represents the average sighting duration of those observations that fall within the boundaries of the hexagon.

p3_data <- merge(ufo_sightings, places, by = c('city', 'state', 'country_code'))[, .(avg_duration_min = mean(duration_seconds) / 60), by = .(latitude, longitude)]

ggplot() +
  geom_spatraster_rgb(data = map_tiles) +
  geom_sf(data = us_map, fill = NA, color = 'black') +
  stat_summary_hex(
    data = p3_data,
    aes(x = longitude, y = latitude, z = avg_duration_min),
    fun = mean,
    bins = 25,
    alpha = 0.8
  ) +
  scale_fill_viridis_c(name = "Avg. duration (min)", option = "mako") +
  labs(title = 'Spatial distribution of average sighting duration in the US') +
  theme_custom() +
  theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    panel.grid.major = element_blank(),
  )
## <SpatRaster> resampled to 500456 cells.

Plot 4: UFO sightings tracker

The 4th plot is an animation of the year with the most sightings in the US: 2014. To set the atmosphere, I used dark themed map tiles as a background. The animation goes through every day of 2014 and shows on a map where there were sightings on that day.

dark_map_tiles <- get_tiles(us_map, apikey = 'add_your_own_stadia_api_key_here', zoom = 4,
                       provider = 'CartoDB.DarkMatterNoLabels', crop = TRUE)
## |---------|---------|---------|---------|=========================================                                          
p4_data <- merge(ufo_sightings, places, by = c('city', 'state', 'country_code'))[year(reported_date_time) == 2014, ]

p4 <- ggplot(p4_data) +
  geom_spatraster_rgb(data = dark_map_tiles) +
  geom_sf(data = us_map, fill = NA, color = 'grey') +
  geom_point(aes(longitude, latitude), color = 'yellow') +
  transition_states(as.Date(reported_date_time)) +
  labs(title = 'US UFO sightings on {closest_state}',
       subtitle = 'Total no. of sightings on this day {p4_data[as.Date(reported_date_time) == as.Date(closest_state), .N]}') +
  theme_custom() +
  theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    panel.grid.major = element_blank(),
  )
## <SpatRaster> resampled to 500456 cells.
animate(p4, duration = 30, fps = 15)

Plot 5: Interactive boxplot of sighting duration distribution by shape

In this plot, I visualized the distribution of sighting durations by shape in the US. However, I had to use a log transformed scale for the duration so that the distributions are more visible. To help users get to know the actual values, I made the plot interactive with ggiraph: hovering over the boxplots reveals certain key metrics of the distributions.

library(ggiraph)

summary_stats <- ufo_sightings %>%
  group_by(shape) %>%
  summarise(
    median_duration = median(duration_seconds),
    mean_duration = mean(duration_seconds),
    min_duration = min(duration_seconds),
    max_duration = max(duration_seconds)
  )

ufo_sightings <- merge(ufo_sightings, summary_stats, by = "shape")

p5 <- ggplot(ufo_sightings, aes(x = factor(shape), y = log(duration_seconds / 60 + 1/60))) +
  geom_boxplot_interactive(aes(
    fill = shape,
    tooltip = paste(
      "Shape:", shape,
      "<br>Median:", round(median_duration/60, 1),
      "<br>Mean:", round(mean_duration/60, 1),
      "<br>Min:", round(min_duration/60, 1),
      "<br>Max:", round(max_duration/60, 1)
    ),
    data_id = shape
  )) +
  scale_fill_viridis_d(name = "UFO Shape") +
  labs(title = "US UFO sighting duration distribution by shape",
       subtitle = "Hover over the boxplots to see detailed summary metrics!",
       x = "Shape", y = "Duration in log minutes") +
  theme_custom() +
  theme(legend.position = 'none',
        axis.text.x = element_text(angle = 90))

girafe(ggobj = p5)

Plot 6: An explorative map of the most recent UFO sightings

In the last plot, I wanted to enable users to explore themselves the most recent UFO sightings. For this, I plotted out each sighting in 2013 on a map, and added certain pieces of information to the tooltips. I made this visualization using plotly, but sadly, this package does not yet support custom map tiles - so I could only plot out the GeoJSON file.

library(plotly)
## 
## Kapcsolódás csomaghoz: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
p6_data <- merge(ufo_sightings, places, by = c('city', 'state', 'country_code')) %>%
  filter(year(reported_date_time) == 2023) %>%
  mutate(
    text = paste("City:", city,
                 "<br>Duration (min):", round(duration_seconds / 60, 2),
                 "<br>Shape:", shape,
                 "<br>Date & time:", reported_date_time
                 )
  )

p6 <- ggplot(p6_data) +
  geom_sf(data = us_map, fill = 'dodgerblue4', color = 'lightgrey', linewidth = 0.5) +
  geom_point(aes(longitude, latitude, text = text), color = 'yellow', size = 0.25) +
  labs(title = 'US UFO sightings in 2023<br>Hover over the points to reveal details!') +
  theme_custom() +
  theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    panel.grid.major = element_blank(),
  )
## Warning in geom_point(aes(longitude, latitude, text = text), color = "yellow",
## : Ignoring unknown aesthetics: text
ggplotly(p6, tooltip = 'text')

These six visualizations conclude my final project for Data Visualization 3.